As we have discussed in class, many urban problems exacerbate the inequalities within an urban system. After more than a year and a half of the pandemic, Covid-19 still threatens people's health and their ability to travel, work, and gather, though some are more affected than others. It is no secret that Covid-19 has hit communities of color and impoverished communities especially hard, and these two groups overlap considerably. Oakland is a useful city for examining the impact of Covid-19 on vulnerable populations because it contains zip codes with large differences in poverty levels as well as many racially segregated neighborhoods, and because its population is large and diverse. With over 433,000 residents, the city has a roughly equal split between White, Black, and Hispanic/Latino residents, with a strong Asian presence as well (source 4). In addition, the poverty rate is 16.7%, more than 5 percentage points above the national average.
Given Oakland's demographics, Covid-19 has affected its populations disproportionately. One example is seen by comparing the zip codes 94603 and 94618, which are on opposite sides of Oakland. In zip code 94603, 30% of children live below the poverty level and the Covid rate is the highest in the city, whereas in zip code 94618, 4% of children live below the poverty level and the Covid rate is the lowest in the city (source 1). Another key point is that zip code 94618 is 77% White while zip code 94603 is majority Black or African American (source 2). Furthermore, at the City of Oakland Racial Disparities Task Force Town Hall it was mentioned that in Alameda County, Latinx residents make up 22% of the general population but 46% of the COVID-19 caseload, and African Americans make up 10% of the general population but 14% of COVID-19 cases (source 3). These statistics point to a need to prioritize research on the impact of Covid-19 by both race and poverty. Local leaders in Oakland have said they are extremely worried about how these disparities affect people of color as well as low-income people, immigrants, people with disabilities, and others (source 5). Many other factors could also indicate Covid-19 impact, risk, and response, so we also wanted to see how these numbers relate to Covid testing rates, since testing can alert community members to infection and let them take the necessary precautions to avoid further spread. We plan to find relationships between these variables and map them geographically to provide insight into Covid-19 in Oakland and to predict future Covid-19 rates.
For this reason, we decided to use Census data for income and race, together with ArcGIS Hub data that records Alameda County COVID-19 cases and case rates over the past 28 days by zip code (https://hub.arcgis.com/datasets/5d6bf4760af64db48b6d053e7569a47b_0/explore?location=37.679493%2C-121.905640%2C10.88, https://hub.arcgis.com/datasets/5d6bf4760af64db48b6d053e7569a47b/explore?layer=4&location=37.679103%2C-121.905640%2C10.88).
Below is a map of Alameda County (highlighted in blue). Oakland, our area of interest, is shown in red, which marks its individual zip codes.
Our ultimate goal was to create a dashboard that makes it easy to view Oakland zip codes in terms of a chosen income level, race, COVID testing rate, and COVID case rate. The key insights we hoped to find were 1) how the amount of COVID testing correlates with COVID cases (i.e., does testing actually have as strong a relationship with case rate as many say?), 2) whether access to testing appears equal across racial and economic backgrounds, and 3) what the disparities in Oakland look like more generally. One issue to note is that our data is only as granular as the zip code level, so Oakland alone provides too few observations for a regression to yield statistically meaningful results. Our analysis of the trends seen in the Oakland graphs therefore had to be supplemented with regression results for all of Alameda County. This introduces some inconsistency between the two scales, but we still felt able to draw informative conclusions. Our reflections on specific graphs are presented below, and the link to our dashboard can be found at the top of this report.
COVID dataset, Alameda County COVID-19 Case/Case Rates by Zip Code, GeoJSON API URL: https://opendata.arcgis.com/datasets/5d6bf4760af64db48b6d053e7569a47b_0.geojson
COVID dataset, Alameda County COVID-19 Test Rates by Zip Code, GeoJSON API URL: https://opendata.arcgis.com/datasets/5d6bf4760af64db48b6d053e7569a47b_4.geojson
The covid case and testing rates are a running total of the past 28 days.
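Both layers are served as GeoJSON, so each zip code arrives as one feature whose attributes sit under `properties`. As a rough sketch of how the case and test layers could be joined by zip code, the Python below parses two tiny inline samples. The field names `Zip`, `CaseRates`, and `TestRates` and all values are assumptions made for illustration; the real layers may name their fields differently, and this is not the code used to build the report.

```python
import json

# Hypothetical samples: field names and values are made up for illustration.
CASES_GEOJSON = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null, "properties": {"Zip": "00001", "CaseRates": 120.5}},
  {"type": "Feature", "geometry": null, "properties": {"Zip": "00002", "CaseRates": 80.0}}
]}
"""
TESTS_GEOJSON = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null, "properties": {"Zip": "00001", "TestRates": 3000.0}},
  {"type": "Feature", "geometry": null, "properties": {"Zip": "00003", "TestRates": 2500.0}}
]}
"""

def rows_from_geojson(text, zip_field="Zip"):
    """Return {zip_code: properties} for every feature in a FeatureCollection."""
    collection = json.loads(text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return {
        str(feature["properties"][zip_field]): feature["properties"]
        for feature in collection["features"]
    }

def join_by_zip(cases, tests):
    """Inner join: keep only zip codes present in both tables."""
    return {z: {**cases[z], **tests[z]} for z in cases.keys() & tests.keys()}

case_rows = rows_from_geojson(CASES_GEOJSON)
test_rows = rows_from_geojson(TESTS_GEOJSON)
joined = join_by_zip(case_rows, test_rows)
```

An inner join is used here so that a zip code missing from either layer drops out instead of producing a row with a missing rate.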
Our first major assumption is that the breakdown of race in this visual is representative of the breakdown of the population in Oakland. Another assumption is the validity, completeness, and accuracy of the data set used. This data set is gathered and produced by the US Census Bureau, so we are assuming it comes from a credible source on the topic and was gathered in a fair and unbiased way.
#Number of Black people in Oakland
Before beginning our regression analysis, we noticed that several outliers distorted the relationship between COVID cases and COVID testing. We therefore identified the zip codes responsible for these outliers and removed them to make the results more reliable.
Below is a plot of all the Alameda zipcode data:
##
## Call:
## lm(formula = CaseRates ~ TestRates, data = alameda_grouping_by_zip1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6030.2 -2471.6  -298.1  1952.1 11263.8
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.784e+03  5.741e+02  10.074    1e-13 ***
## TestRates   1.601e-01  5.631e-02   2.842  0.00643 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3796 on 51 degrees of freedom
## Multiple R-squared: 0.1367, Adjusted R-squared: 0.1198
## F-statistic: 8.079 on 1 and 51 DF, p-value: 0.006426
From this we can identify some zip codes whose data points deviate far from the rest, specifically 94720, 95377, 94621, 94613, and 94603, which are significantly higher in either test rate or case rate. We will remove these rows from the regression in the hope of improving the fit.
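To make the outlier step concrete, here is a minimal Python sketch that drops the five zip codes named above and refits a one-predictor least-squares line. The models in this report were fitted with R's `lm`; the `fit_line` and `drop_outliers` helpers and every number in `rows` are invented for illustration and are not the county data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Zip codes named in the text as outliers.
OUTLIER_ZIPS = {"94720", "95377", "94621", "94613", "94603"}

def drop_outliers(rows, excluded=OUTLIER_ZIPS):
    """Keep only rows whose zip code is not in the excluded set."""
    return [row for row in rows if row["zip"] not in excluded]

# Made-up values, NOT the Alameda County data.
rows = [
    {"zip": "00001", "TestRates": 1000, "CaseRates": 9000},
    {"zip": "00002", "TestRates": 2000, "CaseRates": 8000},
    {"zip": "00003", "TestRates": 3000, "CaseRates": 7000},
    {"zip": "94720", "TestRates": 60000, "CaseRates": 20000},
]

def slope_of(data):
    _, slope = fit_line([r["TestRates"] for r in data],
                        [r["CaseRates"] for r in data])
    return slope

slope_all = slope_of(rows)                  # includes the extreme point
slope_kept = slope_of(drop_outliers(rows))  # extreme point removed
```

In this toy data a single extreme point is enough to turn a negative slope positive, which is why it is worth reporting the fit both with and without the excluded zip codes.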
With these zip codes removed, the graph looks better:
##
## Call:
## lm(formula = CaseRates ~ TestRates, data = alameda_grouping_by_zip1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5390.5 -2214.7   -19.1  1527.7  7656.7
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8665.7717  1187.5665   7.297 3.27e-09 ***
## TestRates     -0.9230     0.3856  -2.394   0.0208 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2794 on 46 degrees of freedom
## Multiple R-squared: 0.1108, Adjusted R-squared: 0.09143
## F-statistic: 5.73 on 1 and 46 DF, p-value: 0.02082
#Results
As we can see, removing the outliers changed the trend of the graph: the TestRates slope flipped from about 0.16 to about -0.92, so the result is sensitive to which zip codes are included. With that caveat, we are ready for analysis. The linear regression appears to imply an association between Test Rates and Case Rates. More specifically, an increase of 1 unit in the Covid testing rate over the past 28 days is associated with a decrease of about 0.92 in the Covid case rate over the past 28 days. The p-value (0.0208) is less than .05, so the result is statistically significant at that level. However, the R-squared of 0.1108 shows that testing rate explains only about 11% of the variation in case rate, and the residuals are skewed (median -19.1, minimum -5390.5, maximum 7656.7), which brings the validity of this correlation into question.
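The figures in the model summary can be cross-checked against each other with a few lines of arithmetic. The Python sketch below uses only values copied from the outlier-removed summary; the variable and function names are ours, and small differences from the printed values are due to rounding in the summary.

```python
import math

# Values copied from the outlier-removed model summary.
intercept = 8665.7717
slope = -0.9230
std_error = 0.3856
r_squared = 0.1108

# t value is the estimate divided by its standard error (summary reports -2.394).
t_value = slope / std_error

# With a single predictor, the F-statistic equals t squared (summary reports 5.73).
f_statistic = t_value ** 2

# The Pearson correlation has the magnitude sqrt(R^2) and the sign of the slope.
correlation = math.copysign(math.sqrt(r_squared), slope)

def predicted_case_rate(test_rate):
    """Fitted case rate at a given test rate, from the printed coefficients."""
    return intercept + slope * test_rate

# Change in the fitted case rate for a one-unit increase in test rate.
one_unit_change = predicted_case_rate(1001) - predicted_case_rate(1000)
```

The implied correlation is roughly -0.33, which matches the reading that the association is present but weak.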
##
## Call:
## lm(formula = estimate ~ CaseRates, data = alameda_grouping_by_zip2)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -460.98 -153.59  -56.13  110.23 1351.71
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.285586  43.098121   1.028    0.305
## CaseRates    0.053918   0.006474   8.328  1.6e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 260.2 on 190 degrees of freedom
## Multiple R-squared: 0.2674, Adjusted R-squared: 0.2636
## F-statistic: 69.36 on 1 and 190 DF, p-value: 1.596e-14
The linear regression appears to imply an association between the number of people making below $25,000 annually and Covid case rates over the past 28 days. More specifically, because the model is specified as estimate ~ CaseRates, an increase of 1 unit in the Covid case rate over the past 28 days is associated with an increase of about 0.05 in the number of people making below $25,000 annually. However, while the p-value indicates statistical significance, the residuals are strongly right-skewed (median -56.13, minimum -460.98, maximum 1351.71), which brings some of the validity of this correlation into question.
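Because the direction of a regression matters when reading its slope, it can help to see what the same data would imply with the variables swapped. For a simple linear regression, the slope of y on x multiplied by the slope of x on y equals R-squared, so the reverse slope can be recovered from the printed summary. The sketch below is our own back-of-the-envelope check using rounded values from the output, not a refit of the model.

```python
# Values copied from the summary of lm(estimate ~ CaseRates).
slope_estimate_on_case_rate = 0.053918
r_squared = 0.2674

# Identity for simple linear regression:
#   slope(y ~ x) * slope(x ~ y) = R^2
# so the slope of CaseRates regressed on estimate would be approximately:
slope_case_rate_on_estimate = r_squared / slope_estimate_on_case_rate
```

Under this identity, each additional person making below $25,000 would be associated with a case rate roughly 5 units higher, a very different number from the 0.05 reported for the model as fitted.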
##
## Call:
## lm(formula = estimate ~ TestRates, data = alameda_grouping_by_zip3)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1459.8 -1102.0  -598.9   912.2  7147.4
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1647.4311   678.0908   2.430   0.0191 *
## TestRates     -0.1096     0.2202  -0.498   0.6210
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1595 on 46 degrees of freedom
## Multiple R-squared: 0.005357, Adjusted R-squared: -0.01627
## F-statistic: 0.2477 on 1 and 46 DF, p-value: 0.621
##
## Call:
## lm(formula = estimate ~ TestRates, data = alameda_grouping_by_zip4)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -6209  -2178   -723   1280   9397
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7791.7985  1485.3246   5.246 3.84e-06 ***
## TestRates     -0.7022     0.4823  -1.456    0.152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3494 on 46 degrees of freedom
## Multiple R-squared: 0.04405, Adjusted R-squared: 0.02327
## F-statistic: 2.12 on 1 and 46 DF, p-value: 0.1522
As we can see from the model summaries, in both cases the p-values (0.621 and 0.152) are well above 0.05, so neither association is statistically significant. This underlines the ambiguity about how much access to COVID tests is or isn't a problem, at least in Alameda County. In the graph of Black population versus test rates, some zip codes with more Black residents had lower test rates, but zip codes with fewer Black residents showed both much higher and much lower testing. For the White population, the spread seems fairly even, with both high and low test rates across the range of population counts. However, there are many caveats, and these results should not be taken to mean that COVID testing is or is not equally accessible to a given racial group, but rather that we need more data. Additionally, factors unrelated to race could raise or lower the numbers, such as a zip code simply having more or fewer people in it, which is a downside of our dataset providing raw counts rather than percentages.
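The raw-count limitation mentioned above could be addressed by converting each group count to a share of the zip code's total population before regressing. A minimal Python sketch follows; the `population_share` helper and the counts are hypothetical and chosen only to show why two zip codes with the same raw count can look very different as percentages.

```python
def population_share(group_count, total_population):
    """Percentage of a zip code's residents who belong to the group."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return 100.0 * group_count / total_population

# Made-up counts for two hypothetical zip codes with the same raw group count.
zip_counts = {
    "00001": {"group": 5000, "total": 50000},
    "00002": {"group": 5000, "total": 10000},
}

shares = {
    zip_code: population_share(counts["group"], counts["total"])
    for zip_code, counts in zip_counts.items()
}
```

Here both zip codes have 5,000 residents in the group, yet the shares are 10% and 50%, so a regression on raw counts would treat them as identical while a regression on shares would not.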
Overall, it was nice to see that higher Covid testing rates do appear to be associated with lower Covid case rates, although the correlation did not seem particularly strong. It was also hard to draw conclusions about how racial and economic background relate to Covid test accessibility and case rates. I think a large reason for this is that the association can only be made loosely, since we do not know which of the Covid cases or tests came from people of which economic or racial groups. However, I still think there was knowledge to be gained from this project and its results: we came to understand the complexity of the question and identified areas where we should go deeper, while also weighing the concerns that come with trying to get clearer data, like anonymity (for example, it could be quite problematic for some people to have their racial or economic identity reported along with a Covid test or case in a publicly available dataset).
Sources:
1. https://calmatters.org/health/coronavirus/2021/06/california-covid-inequality-oakland-rockridge/
2. https://www.unitedstateszipcodes.org/
3. https://www.accfb.org/how-covid-19-is-affecting-communities-of-color/
4. https://www.census.gov/quickfacts/oaklandcitycalifornia
5. https://www.oaklandca.gov/news/2020/local-leaders-announce-covid-19-racial-disparities-task-force